ABSTRACT

This study discusses latent trait theory applications to test item bias methodology. A real data set is used in describing the rationale and application of the Rasch probabilistic model item calibrations across various ethnic group populations. A high school graduation proficiency test covering reading comprehension, writing mechanics, and mathematics was administered to 1,042 white and 11,441 black students in a large west coast school district. Using UCON estimation procedures for item difficulties, item plots for each ethnic group by the three separate subtests were prepared. The derivation of acceptable tolerance limits is described and applied to the current data set, wherein a biased item is revealed. The mathematics are given although their derivation is not described except when required for completeness. (Author/BH)
Item bias

Abstract

This study discusses latent trait theory applications to test item bias methodology. A real data set is used in describing the rationale and application of the Rasch probabilistic model item calibrations across various ethnic group populations. The mathematics are given although their derivation is not described except when required for completeness. Using UCON estimation procedures for item difficulties, item plots for each ethnic group by the several tests available (Reading, Written Expression, Mathematics) were prepared. The derivation of acceptable tolerance limits is described and applied to the current data set, wherein a biased item is revealed.
LATENT TRAIT THEORY APPLICATIONS
TO TEST ITEM BIAS METHODOLOGY



Unbiased student assessment on standardized tests currently in use is a quest fraught with confusion, misunderstanding, and misinterpretation in the current glass darkly debate over bias in mental testing. Issues raised are: 1) predictive validity for children from minority group backgrounds may be misrepresented by the standardization and validation groups; 2) for internal and construct criteria of bias, statistical adjustments alone (viz., regression techniques or ANOVAs, regardless of how sophisticated) are neither supported by the empirical data available nor likely to gain acceptance (outside of psychometric debate) for administrative, political, and legal arguments; and 3) as the cultural ethic changes from demands for equal opportunity to expectations of undifferentiated outcomes, the discussion of differential validity and test bias will likely become more heated. Any resolution of these major points, of course, will hardly exhaust the argument (cf. Lord, 1971; Jensen, 1980; Scheuneman, 1975); and, as Fincher (1975) points out, the attitude of the federal courts in dealing with the consequences of unrecognized or unapproached test item bias is itself a virtual enigma.

Given the current controvertible environment of test bias, its detection and correction, latent trait methodology, and specifically the logistic response model, offers some appealing avenues for investigation. It is the intent of this study to further the exploration of latent trait theory applications to test item bias methodology. The techniques used, and their rationale and utility as applied to the current data set, are discussed. Detailed explanations of latent trait theory are given elsewhere (Hambleton, 1978; Lord, 1968; Harm, 1978; Wright, 1979) and are not repeated here.

It is difficult to ignore the advantages latent trait theory offers over traditional psychometric methods in pursuing a true, consistent, and workable definition and approach to the detection and correction of test item bias in widely used standardized tests. Of particular interest is the statistical independence of persons and test items. The separate estimation of these parameters in the logistic response model approach (and its mathematical derivates) provides an avenue to avoid difficulties inherent in conventional biased item detection techniques; yet the criterion of a consistent definition of item bias is still satisfied.

Scheuneman (1975) proposed a new operational definition which we shall adopt as containing sufficient rigor and accuracy for present purposes: "An item is considered unbiased if, for persons with the same ability in the area being measured, the probability of a correct response on the item is the same regardless of the population group membership of the individual." This definition of bias is consistent with that used by Green and Draper (1972) and Pine and Weiss (1976). Scheuneman's definition, describing the interaction of an examinee with a particular item, provides a utilitarian way of detecting item bias in the context of, but not dependent upon, examinee performance.

The problem initially is one of separating the parameters of persons and test items. Latent trait theory does this neatly and simply by proposing the model

$$\frac{P}{Q} = \frac{\alpha}{\zeta}$$

where   P = probability of a correct response
        Q = probability of an incorrect response
        $\alpha$ = person ability
        $\zeta$ = item difficulty

or, by log scale:

$$\ln\frac{P}{Q} = \ln\frac{\alpha}{\zeta}$$

and, expressing this ratio as a difference of logs yields:

$$\ln\frac{P}{Q} = \ln\alpha - \ln\zeta$$
When a set of data is applied to the mathematical derivates of the model, the statistics are easily calculated. The essential point for the present investigation is realized in their latently additive property. Hence, item difficulty can be separated from person ability. The methodology of item-free measurement continues on to describe precisely how person-free item difficulties are estimated, as well as item-free person abilities. These calculations are described elsewhere (e.g., Anderson, 1973, 1977; Baker, 1977; Hambleton, 1978; Rasch, 1961; Ryan, n.d.; Wright, 1977, 1979a, 1979b).

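In the Rasch model, however, $\ln\alpha$ and $\ln\zeta$ are simply redefined as:

$\beta = \ln\alpha$, and
$\delta = \ln\zeta$

Hence, the derived equation, expressed probabilistically, is

$$P\{x_{vi} = 1 \mid \beta_v, \delta_i\} = \frac{\exp(\beta_v - \delta_i)}{1 + \exp(\beta_v - \delta_i)}$$

where a person ($v$) with a defined ability ($\beta_v$) interacts with an item ($i$) with a calibrated difficulty ($\delta_i$) to produce a response (here a correct response, $x_{vi} = 1$).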

This is the only alternative among the models for a response curve which allows for independent estimation of person ability and item difficulty. Says Rasch: "When the estimators for $\beta_v$ and $\delta_i$ are derived by maximizing a conditional likelihood they are unbiased, consistent, efficient and sufficient."
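
The additive property of the model can be illustrated numerically. The following is a minimal sketch, assuming the usual one-parameter logistic form in which the probability of success depends only on the difference between ability and difficulty in logits. The function names `rasch_probability` and `log_odds`, and the person and item values, are illustrative assumptions and do not come from the study.

```python
import math

def rasch_probability(beta, delta):
    """Probability of a correct response for ability beta and difficulty delta (logits)."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def log_odds(p):
    """Natural log of the odds of success, ln(P / Q) with Q = 1 - P."""
    return math.log(p / (1.0 - p))

# Hypothetical person and items, all in logits.
ability = 1.0
easy_item, matched_item, hard_item = -1.0, 1.0, 3.0

p_easy = rasch_probability(ability, easy_item)
p_matched = rasch_probability(ability, matched_item)
p_hard = rasch_probability(ability, hard_item)

# The log-odds recover the additive difference between ability and difficulty.
recovered = log_odds(p_hard)  # close to ability - hard_item
print(p_easy, p_matched, p_hard, recovered)
```

When ability equals difficulty the probability of success is one half, and the log-odds return the simple difference between the two parameters, which is what allows person and item effects to be separated.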

Item characteristic curves are computed by the regression of the test scores on ability $\beta$ from a frequency distribution of test scores for each fixed level of $\beta$. Wright (1979) graphs the ogive for the theoretical response curve.
Insert Table 1 here

The Rasch Model Logistic Response Curve

Figure from Best Test Design by B. D. Wright and M. H. Stone (MESA Press, 1979). Used with permission of the author.
With the logistic response model, at least three tests for validity are required. These are: 1) the more able a person, the better the chance for success on any particular item; 2) any person has a better chance of correctly answering an easy item than a difficult one; and 3) these conditions must be observably true regardless of any person's race, sex, or other noninterfering characteristic.

The third criterion is critical in test item bias methodology in that Scheuneman's definition of a biased test item is satisfied and it implies the notion of parameter independence. Herein lies the departure of the logistic response model approach to biased item detection from traditional techniques. The statistics conventionally employed in the search for a biased item are not independent of the sample ability distribution and are distorted for any independent analysis by this sample-specific characteristic. Rudner (1980a) reviews several of the commonly used biased item detection techniques of empirical evidence of internal criteria in test bias; and Peterson (n.d.) examines common arguments of bias in predictive validity. Each of the techniques and strategies discussed, however, is linked directly to the sample ability distribution. Tucker (1946) argues that this characteristic is not one that can enhance test rigor but actually may confound its own intentions. Wright (1976) demonstrates the point by citing a term, "sonata", that has high discrimination indices and is culturally skewed. Thus, the critical component of separate parameter estimation is not successfully addressed by any of the more traditionally used biased item detection statistics.



One further point is important to note for the present investigation. Latent trait theory and the logistic response model assume local item independence. That is, the performance of any examinee on any particular test item is an autonomous result of the interaction of pupil ability and item difficulty. The response by the examinee to that item is not influenced by a previous performance on any other item in the test. Lord (1953) demonstrated the validity of this assumption with a goodness-of-fit statistical test.



Item Bias Study

The present study of test item bias was conducted in a large west coast school district and utilized a single high school graduation proficiency examination. The test itself was developed with items selected from various recognized item banks along with a few new situation-specific items necessitated by the previously defined test content specifications. The items were then Rasch calibrated for goodness-of-fit of each to the model. Misfitting or ambiguously worded items were discarded. The calibrations were conducted with UCON estimation procedures.

It is to be noted that for the Rasch model calibration a single-parameter assumption was made. Rudner (1977, 1980b) is critical of this assumption because ability in Rasch single-parameter theory is based upon total score. Consequently, the presence of biased items aggregated into a total score could yield spurious results. He recommends adoption of the three-parameter model as developed by Birnbaum. This study did not accept the suggestion of a three-parameter model. Although it is conceded that a degree of rigor is added by the additional consideration of an item discrimination index and a pseudo-guessing parameter, the increased complexity as well as the added difficulty of interpretation were not warranted in the present circumstance.

The high school graduation proficiency test was comprised of three subtests: reading comprehension, writing mechanics, and mathematics. A writing sample is also a required portion of the complete high school graduation proficiency test, but it was scored by a holistic process and scores were not equated to Rasch scaling; thus, it was excluded from the present study. For the subtests included, all questions were dichotomously scored multiple-choice questions. The reading test contained 30 items; the writing mechanics and mathematics tests had 35 each. Each subtest was treated independently of all others, and item difficulty invariance was evaluated over each ability group for Black and White ethnic groups. Item plots for each ethnic group with a total group (i.e., all ethnic groups combined) were also examined.
' ' ' 
The Sample Population

The sample population in the present application of latent trait theory to test item bias methodology included 5,309 reading comprehension tests, 5,284 tests of written expression, and 5,780 mathematics tests. By ethnic groups the distribution was less equal numerically, although sufficient within each to yield valid results. The ethnic group populations were: 1,042 White, 11,441 Black, and 16,373 total group (including all Whites, all Blacks, and other unidentified). Table 2 presents these data as well as mean ability and standard deviation estimates for each ethnic group by subtest.



Insert Table 2 about here 
Table 2

Ethnic Group by Subtest:
Number, Mean Ability, and Standard Deviation of Ability

                                       Ethnic Source Group
                             White               Black               Total*
Test                       N   Mean    SD       N   Mean    SD       N   Mean    SD
Reading (30 items)        308  2.19   .86    3,717  1.14           5,309  1.21   .96
Written Expression
  (35 items)              384  1.89  1.05    3,557   .68   .92     5,284   .83  1.63
Mathematics (35 items)    350  2.45  1.04    4,167  1.25  1.01     5,780  1.40  1.12
Total                   1,042               11,441                16,373

*Includes total White, total Black, and other unidentified
Item Difficulty Estimation



For the best calibration, persons should be about evenly distributed over the range of scores around and above the center of the test (Wright, 1979a). The sample person ability distribution data were computed; and, as the data reveal, scores are not symmetrically distributed but negatively skewed around and above a modal raw score of 22 to 25 in reading comprehension, 19 to 26 in written expression, and 29 to 30 in mathematics. This result was anticipated due to the nature of item content specifications for minimal high school graduation skills rather than allowing for a range of abilities from quite low to very high. (The frequency distribution tables for each subtest are included in Appendix B.)

Item Plots

The constructed item plots for each of the items on every subtest allow inspection of the extent to which the item points conform to the model expectation of item difficulty invariance. This inspection of item invariance across different ethnic groups is a measure of the quality of individual items to be free from or contaminated by some degree of bias.

Each pair of calibrations applies to one, and only one, item, and of course, two difficulties ($d_{i1}$ and $d_{i2}$) are derived. Standard errors for each item ($s_{i1}$ and $s_{i2}$) were also computed. Hence, only a single translation is necessary to establish an origin common to both sets of items at any difficulty.

Wright (1979a) gives the statistic for testing the estimate of $\delta_i$ by $d_{i1}$ and $d_{i2}$:

$$\frac{d_{i1} - d_{i2}}{(s_{i1}^2 + s_{i2}^2)^{1/2}} \sim N(0, 1)$$
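
The comparison of two calibrations can be sketched in code. This is a minimal illustration, assuming each group's item difficulties are first centered at their own mean (one simple way to carry out the single translation to a common origin) and that the standardized difference is the difference in difficulty divided by the square root of the summed squared standard errors. The function names and all item values are hypothetical.

```python
import math

def center(difficulties):
    """Translate a set of item difficulties so that their mean is zero."""
    mean = sum(difficulties) / len(difficulties)
    return [d - mean for d in difficulties]

def standardized_difference(d1, s1, d2, s2):
    """Difference between two estimates of one item's difficulty, in joint standard error units."""
    return (d1 - d2) / math.sqrt(s1 ** 2 + s2 ** 2)

# Hypothetical calibrations of the same five items in two groups (logits).
# Group 2 is on a different origin, and item 5 is relatively harder for it.
group1 = [-1.0, -0.5, 0.0, 0.5, 1.0]
group2 = [-0.7, -0.2, 0.3, 0.8, 2.3]
se1 = [0.12] * 5
se2 = [0.09] * 5

c1, c2 = center(group1), center(group2)
z = [standardized_difference(a, sa, b, sb)
     for a, sa, b, sb in zip(c1, se1, c2, se2)]

# Flag items lying more than about two joint standard errors from identity.
flagged = [i + 1 for i, value in enumerate(z) if abs(value) > 2.0]
print(flagged)
```

Centering on the mean lets a single discrepant item shift every other item slightly, so in practice the translation might be re-estimated after a flagged item is set aside; that refinement is not shown here.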
Tests for the quality of fit of each item (viz., item invariance across ethnic group population calibrations) can be made by positioning quality control boundaries at about two standard errors away from an identity line on each side. Two of these quality control boundaries parallel to the initial identity control line approximate a 95% confidence boundary. This is calculated by a formula for $D_{i12}$, the perpendicular distance between the quality control line and the identity line. The formula $(s_{i1}^2 + s_{i2}^2)^{1/2}$ estimates the standard error of the difference between the two independent estimates $d_{i1}$ and $d_{i2}$, and half of this, or $\tfrac{1}{2}(s_{i1}^2 + s_{i2}^2)^{1/2}$, is perpendicular to a 45 degree identity line.

These control lines for identity plots may be graphically presented as follows.

Insert Table 3 about here
Table 3

Estimation of Quality Control Lines

[Identity plot of one calibration against the other, showing the identity line with 68% and 95% boundaries on either side.]

Figure from Best Test Design by B. D. Wright and M. H. Stone (MESA Press, 1979). Used with permission of the author.
Table 4 presents the sample population groups by subtest upon which fit statistics were computed. The calibrations were UCON estimations. (Technically, last difference change and comparisons with PROX procedures were also calculated. The tables for each ethnic group by subtest are included in Appendix A.) Table 5 displays the schema of study design in which the nine item plots were constructed.

Insert Table 4 and Table 5 about here
Table 4

Sample Population Group by Subtest

No.   Source           Subtest
1     White x Black    Reading
2     White x Total    Reading
3     Black x Total    Reading
4     White x Black    Written Expression
5     White x Total    Written Expression
6     Black x Total    Written Expression
7     White x Black    Mathematics
8     White x Total    Mathematics
9     Black x Total    Mathematics
Table 5

Schema of Study Design

[Schema of the test items: 100 items in total, including 35 Written Expression items.]
Data Analysis

An examination of each of the nine item plots reveals a remarkable degree of item invariance for each of the items on all subtests. Two of the item plots are displayed: Table 6 presents the subtest exhibiting the least item invariance (ethnic group Black versus ethnic group Total: Math), and Table 7 displays the item plot wherein a departure from item invariance beyond confidence limits is revealed (ethnic group White versus ethnic group Black: Written Expression). (The remaining item plots are included in Appendix C.)



Insert Table 6 and Table 7 about here
The reading subtest showed the least departure from item invariance for each ethnic group comparison. The mathematics subtest also exhibited minimal departure from item invariance despite the largest item difficulty range among the three subtests (-2.339 to 4.296 logits).

Within the writing subtest, however, a single item did exhibit an unacceptable degree of departure from item invariance. This item plotted for ethnic group White versus ethnic group Black outside of the quality control lines, and thus represented the detection of an item confounded by ethnic group calibration. The item, identified as Item No. 461, calibrated at 1.327 logits difficulty with a standard error of .110 for ethnic group White, and nearly twice as large, at about 2.5 logits difficulty with a standard error of .048, for ethnic group Black.
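
The size of this discrepancy can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the two reported calibrations are already expressed on a common origin, takes the Black-group difficulty as roughly 2.5 logits, and uses the standardized difference of two independent estimates. The variable names are hypothetical.

```python
import math

# Values reported in the text for the flagged item (logits).
d_white, se_white = 1.327, 0.110
d_black, se_black = 2.5, 0.048  # difficulty taken as roughly 2.5 (assumption)

# Standard error of the difference between two independent estimates.
joint_se = math.sqrt(se_white ** 2 + se_black ** 2)

# Standardized difference between the two calibrations.
z = (d_white - d_black) / joint_se

# An item is treated as outside the control lines beyond about two standard errors.
outside_two_se = abs(z) > 2.0
print(round(joint_se, 3), round(z, 2), outside_two_se)
```

Under these assumptions the two calibrations differ by far more than two joint standard errors, which is in keeping with the item falling outside the quality control lines on the item plot.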

The large difference in item difficulty estimates between calibration by ethnic group White and ethnic group Black, and the resultant outlier characteristic on the item plot, pointed to an inspection of the item wording. The item read as follows:

Insert Figure 1 about here
Figure 1

Item No. 461

Select the word or group of words that correctly completes the sentence.

The grasshoppers in our garden ______ the vegetables.

 A. is eating
 B. eats
*C. eat
 D. does eat
Curriculum experts examined the item and surmised that the correct response C (eat) may be confounded by modest dialectical differences among ethnic group Black examinees. Traditional item statistics support this supposition. As revealed by p values, ethnic group Black examinees missed the item much more often than did ethnic group White examinees, and response B (eats) was the distractor most frequently selected by ethnic group Black. Not surprisingly, analysis of variance revealed that while between-group variance was large, within-group variance was very small. Yet, in total group the item held a high discrimination index (point biserial). Thus, in this study of item bias detection, it is likely that this particular item may have been overlooked in a search for biased items using traditional statistics; yet, with the logistic response model of latent trait theory, this defective item was detected and removed from the test.